{
 "metadata": {
  "language": "Julia",
  "name": "",
  "signature": "sha256:24922c86884755544aa7c056387574329f22e923c62395d2349676f46e2602b4"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Time Series Analysis\n",
      "\n",
      "- A sequence of data points, each associated with a time\n",
      "- $x$ is the vector of the time series \n",
      "- $x_i$ denotes the temperature on day $i$"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "using Gadfly, Convex, SCS\n",
      "temps = readdlm(\"melbourne_temps.txt\", ',')\n",
      "n = size(temps)[1]\n",
      "p = plot(\n",
      "  x=1:1500, y=temps[1:1500], Geom.line,\n",
      "  Theme(panel_fill=color(\"white\"))\n",
      ")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# What is the mean of the time series?\n",
      "mean_temperature = mean(temps)\n",
      "@show mean_temperature \n",
      "\n",
      "# What about the RMS?\n",
      "rms = sqrt(sum(x -> x*x, temps-mean_temperature) / length(temps))\n",
      "@show rms"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### How do we lower the RMS error?\n",
      "\n",
      "- Find a smooth curve that approximates the yearly ups and downs\n",
      "- Represent this as vector $s$ where $s_i$ denotes temperature on day $i$\n",
      "- Let's force a yearly trend\n",
      "    $$s_i == s_{i - 365}$$\n",
      "\n",
      "- Temperature on each day in our model should be relatively close to the actual temperature of that day\n",
      "- Model needs to be smooth - change in temperature from day to day should be relatively small\n",
      "\n",
      "$$\\sum_{i = 1}^n (s_i - x_i)^2 + \\lambda \\sum_{i = 2}^n(s_i - s_{i - 1})^2$$\n",
      "\n",
      "where $\\lambda$ is the smoothing parameter. The larger $\\lambda$ is, the smoother our model will be."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Create a variable \n",
      "yearly = Variable(n)\n",
      "\n",
      "# Let's add the equality constraints\n",
      "eq_constraints = []\n",
      "for i in 365 + 1 : n\n",
      "    # Our predicted temperature for today is same as one year ago\n",
      "    eq_constraints += yearly[i] == yearly[i - 365]\n",
      "end\n",
      "\n",
      "smoothing_param = 100\n",
      "# Change in temperature from day to day should be relatively small\n",
      "smooth_objective = smoothing_param * \n",
      "                   sum_squares(yearly[1 : n - 1] - yearly[2 : n])\n",
      "\n",
      "# Construct the problem\n",
      "problem = minimize(sum_squares(temps - yearly) + smooth_objective, \n",
      "                   eq_constraints)\n",
      "\n",
      "# Solve the problem\n",
      "solve!(problem, SCSSolver(max_iters=5000, verbose=0))\n",
      "\n",
      "# Plot smooth fit\n",
      "p = plot( \n",
      "    # Red line represents predicted model    \n",
      "    layer(x=1:1500, y=evaluate(yearly)[1:1500], Geom.line, \n",
      "          Theme(default_color=color(\"red\"), line_width=2px)),\n",
      "    # Blue line represents actual temperatures\n",
      "    layer(x=1:1500, y=temps[1:1500], Geom.line),\n",
      "    Theme(panel_fill=color(\"white\"))\n",
      ")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Calculate the residuals \n",
      "residuals = temps - evaluate(yearly)\n",
      "\n",
      "# Plot residuals for a few days\n",
      "p = plot(\n",
      "    x=1:100, y=residuals[1:100], Geom.line,\n",
      "    Theme(default_color=color(\"green\"), panel_fill=color(\"white\"))\n",
      ")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "rms = sqrt(sum(x -> x*x, residuals) / length(residuals))\n",
      "@show rms"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- Hypothesis: residual temperature on a given day is some linear combination of the previous $5$ days (autoregressive model)\n",
      "- Essentially trying to fit the residuals as a function of other parts of the data itself\n",
      "- We want to find a vector of coefficients $a$ such that\n",
      "\n",
      "$$\\mbox{r}(i) \\approx \\sum_{j = 1}^5 a_j \\mbox{r}(i - j)$$\n",
      "\n",
      "This can be done by simply minimizing the following sum of squares objective\n",
      "\n",
      "$$\\sum_{i = 6}^n \\left(\\mbox{r}(i) - \\sum_{j = 1}^5 a_j \\mbox{r}(i - j)\\right)^2$$"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Generate the residuals matrix\n",
      "ar_len = 5\n",
      "residuals_mat = residuals[ar_len : n - 1]\n",
      "for i = 1:ar_len - 1\n",
      "  residuals_mat = [residuals_mat residuals[ar_len - i : n - i - 1]]\n",
      "end\n",
      "\n",
      "# Solve autoregressive problem\n",
      "ar_coef = Variable(ar_len)\n",
      "problem = minimize(sum_squares(residuals[ar_len + 1 : end] - \n",
      "                               residuals_mat * ar_coef))\n",
      "solve!(problem, SCSSolver(max_iters=5000, verbose=0))\n",
      "\n",
      "# plot autoregressive fit of daily fluctuations for a few days\n",
      "ar_range = 1:145\n",
      "day_range = ar_range + ar_len\n",
      "p = plot(\n",
      "  layer(x=day_range, y=residuals[day_range], Geom.line, Theme(default_color=color(\"green\"))),\n",
      "  layer(x=day_range, y=residuals_mat[ar_range, :] * evaluate(ar_coef), Geom.line, Theme(default_color=color(\"red\"))),\n",
      "  Theme(panel_fill=color(\"white\"))\n",
      ")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, we can add our autoregressive model for the residual temperatures to our smooth model to get an better fitting model for the daily temperatures in the city of Melbourne:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "total_estimate = evaluate(yearly)\n",
      "total_estimate[ar_len + 1 : end] += residuals_mat * evaluate(ar_coef)\n",
      "\n",
      "# plot final fit of data\n",
      "p = plot(\n",
      "  layer(x=1:1500, y=total_estimate[1:1500], Geom.line, Theme(default_color=color(\"red\"))),\n",
      "  layer(x=1:1500, y=temps[1:1500], Geom.line),\n",
      "  Theme(panel_fill=color(\"white\"))\n",
      ")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The RMS error of this final model is $2.3$."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}